A Machine Learning Approach Towards Anomalous Traffic Detection in Web Applications

Gonzalo De La Torre, Vivek Sarkale
*Open Cloud Institute, University of Texas at San Antonio, San Antonio, Texas, USA*
gonzalo.delatorreparra@utsa.edu, prn180@utsa.edu

Project Definition

This project proposes an anomaly-based intrusion detection system built on a machine learning model trained on normal traffic sent from a client web browser to a web server. Traffic is monitored at the application layer (HTTP protocol), and a new HTTP request is flagged as anomalous when it deviates from the model by more than a defined threshold.

Intrusion Detection Systems (IDS) analyze network traffic to detect malicious actions or behaviors that can compromise sensitive data or the security of a computer system. Typically, they are classified as signature-based (negative approach) or anomaly-based (positive approach). Signature detection systems compare signatures of incoming traffic with signatures of known attacks saved in a database. Anomaly detection systems, on the other hand, build a model of network traffic based on what is considered normal traffic. The model is then used to monitor incoming traffic, and any traffic deviating from the “normal” behavior is classified as anomalous.
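To make the contrast between the two approaches concrete, the following is a minimal sketch, not the model proposed in this project. The signature list, the class name `CharFrequencyModel`, and the choice of character frequencies as the "normal" profile are all assumptions made for illustration; the proposal itself only states that deviation from normal traffic is compared against a threshold.

```python
import math
from collections import Counter

# Hypothetical signature database: substrings of known attacks (illustrative only).
SIGNATURES = ["<script>", "' or '1'='1", "../"]


def signature_match(request: str) -> bool:
    """Signature-based check: flag a request that contains a known attack pattern."""
    lowered = request.lower()
    return any(sig in lowered for sig in SIGNATURES)


class CharFrequencyModel:
    """Toy anomaly model (an assumption, not the proposed RNN):
    learns the character frequencies of normal requests only."""

    def __init__(self, smoothing: float = 1e-3):
        self.smoothing = smoothing
        self.counts = Counter()
        self.total = 0

    def fit(self, normal_requests):
        """Build the 'normal' profile from requests known to be benign."""
        for req in normal_requests:
            self.counts.update(req)
            self.total += len(req)
        return self

    def score(self, request: str) -> float:
        """Average negative log-probability per character (higher = more unusual)."""
        if not request:
            return 0.0
        vocab = len(self.counts) + 1  # +1 leaves probability mass for unseen characters
        denom = self.total + self.smoothing * vocab
        nll = 0.0
        for ch in request:
            nll -= math.log((self.counts[ch] + self.smoothing) / denom)
        return nll / len(request)

    def is_anomalous(self, request: str, threshold: float) -> bool:
        """Anomaly-based check: flag a request whose score exceeds the threshold."""
        return self.score(request) > threshold
```

The signature check can only catch patterns already listed in `SIGNATURES`, while the anomaly check never sees an attack during fitting and flags whatever scores above the threshold. In practice the threshold would be tuned on held-out normal traffic.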

Because the goal is to detect new attacks, for which no signature exists yet, the intrusion detection system to be developed will be anomaly-based.

Outcome

A recurrent neural network (RNN) model will be developed by training on a series of network traffic sessions originating from a client web browser, each session containing multiple requests of normal traffic. The RNN model will then be tested against both normal and abnormal traffic to determine whether it correctly identifies the monitored traffic as normal or abnormal.
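As a rough illustration of how a session of requests could be fed through a recurrent model, the sketch below encodes each request character by character and carries one hidden state across all requests of a session. The character-level encoding, the Elman-style cell, the hidden size, and the names `build_vocab`, `encode`, and `TinyRNN` are assumptions made for this example; the proposal does not fix an architecture, and the weights here are random and untrained.

```python
import math
import random


def build_vocab(requests):
    """Map each character seen in training to an integer id; id 0 is reserved for unknown."""
    chars = sorted({ch for req in requests for ch in req})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(request, vocab):
    """Turn a request string into a list of integer ids (unknown characters become 0)."""
    return [vocab.get(ch, 0) for ch in request]


class TinyRNN:
    """Untrained Elman-style cell: h_t = tanh(W_xh[x_t] + W_hh^T h_{t-1} + b)."""

    def __init__(self, vocab_size, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One row per token id, equivalent to multiplying a one-hot input by W_xh.
        self.w_xh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                     for _ in range(vocab_size)]
        self.w_hh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                     for _ in range(hidden_size)]
        self.b = [0.0] * hidden_size

    def step(self, token_id, h):
        """Advance the hidden state by one input token."""
        x = self.w_xh[token_id]
        n = self.hidden_size
        return [
            math.tanh(x[j] + sum(self.w_hh[i][j] * h[i] for i in range(n)) + self.b[j])
            for j in range(n)
        ]

    def run_session(self, session_ids):
        """Carry the hidden state across every request of one session."""
        h = [0.0] * self.hidden_size
        for request_ids in session_ids:
            for token_id in request_ids:
                h = self.step(token_id, h)
        return h
```

A trained version would add an output layer and a loss computed on normal sessions only; the final hidden state (or a per-token prediction error) would then feed the normal/abnormal decision.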

Dataset

This project uses the HTTP CSIC 2010 dataset developed at the "Information Security Institute" of CSIC (Spanish Research National Council). The dataset contains thousands of labeled HTTP requests targeted at an e-commerce website. In these requests, users add items to a shopping cart, register, and provide personal information. In total, the dataset contains more than 36,000 normal requests and 25,000 anomalous requests.
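Before any model can be trained, each labeled request has to be broken into fields. The following sketch assumes that a request is available as raw HTTP text (request line, headers, blank line, optional body); the function name `parse_request` and the example host `shop.example` are hypothetical and not taken from the dataset.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_request(raw: str) -> dict:
    """Split one raw HTTP request into method, path, headers, and parameters."""
    # Normalize line endings, then separate the head from the optional body.
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    lines = head.split("\n")

    # Request line: METHOD TARGET VERSION (assumes the target is URL-encoded).
    method, target, _version = lines[0].split(" ", 2)
    url = urlsplit(target)

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    # Query-string parameters, plus form parameters for POST requests.
    params = parse_qsl(url.query, keep_blank_values=True)
    if method == "POST":
        params += parse_qsl(body.strip(), keep_blank_values=True)

    return {"method": method, "path": url.path, "headers": headers, "params": params}
```

For example, a request line such as `GET http://shop.example/cart/add.jsp?id=2&qty=1 HTTP/1.1` would yield the method `GET`, the path `/cart/add.jsp`, and the parameters `id` and `qty`, which can then be encoded as model input.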

The anomalous requests carry static attacks, dynamic attacks, and unintentional illegal requests that were generated using Paros and W3AF. The following attacks can be found in the dataset:

Obsolete file existence
Default file or example file existence
HTTP method validity
CRLF injection
Failure to restrict URL access
Invalid parameters
Command injection
Cross site scripting
SQL injection
Buffer overflows
Broken authentication and session management
Broken access control
Remote administration flaws
Web application and server misconfiguration
Malicious file execution
Insecure direct object reference
Information leakage and improper error handling

Dataset source: http://www.isi.csic.es/dataset/

University of Texas at San Antonio

Machine Learning/BigData EE-6973-001-Fall-2016
